Crowdsourced Feedback to Improve Resident Physician Error Disclosure Skills

Key Points
Question: Is crowdsourced feedback from laypeople an effective educational intervention to improve resident physicians’ error disclosure communication skills?
Findings: In this randomized clinical trial including 146 second-year internal medicine and family medicine residents, practice with simulation software followed by feedback from crowdsourced laypeople was associated with a modest increase in communication ratings on a 5-point scale among residents who reviewed their feedback.
Meaning: The findings from this trial suggest that self-directed review of crowdsourced feedback is an effective way for residency programs to address their requirement to prepare trainees for communicating with patients after medical harm.


Introduction
Incomplete or poor physician communication magnifies the pain and uncertainty experienced by patients and impairs efforts to improve patient safety.5,6 To better prepare physicians, the Accreditation Council for Graduate Medical Education requires that all residents receive training and practice in adverse event disclosure to patients.7 However, 23% of US residencies provided no such training in 2021.8 Most other programs provided only informal training or lectures, approaches that are necessary but likely insufficient.10-12 To supplement lectures and bedside learning, educators need practical tools for residents to practice simulated medical error disclosure and receive reliable, patient-centered formative feedback. The video-based communication assessment (VCA) is software for this purpose, but limited evidence exists regarding its effectiveness.
The VCA provides physicians with practice and feedback on their communication skills.13 It presents videos of vignettes and prompts users to audio-record what they would say to the patient.
Recorded responses are rated by web-based panels of laypeople responding as if they were the patient in the scenario.14 The laypeople are recruited via Amazon Mechanical Turk (MTurk), a crowdsourcing website with a large and diverse participant population.15,16 Physicians receive feedback reports with summary ratings of their performance, average peer scores, learning points derived from raters' comments, and audio of highly rated peer responses.18,19 First, learning points reinforce desired behaviors and help learners to reconstruct task knowledge around the approach desired by patients. Second, listening to exemplars aids the conceptualization of ideal performance on specific communication subtasks. For example, cases are organized around challenging questions raised by patients that physicians may struggle to address without training and practice 1 (eg, "Why did this happen?" or "Who is going to pay for this care?"). Third, personal ratings help learners to gauge relative performance and determine areas for further practice.
In prior studies,20,21 the VCA proved highly acceptable and feasible for preparing learners for common communication scenarios, and raters generated high-quality, actionable feedback. For VCA cases presenting harmful medical errors, panels of crowdsourced laypeople provided ratings that were consistent with those of patients with personal experience with harmful error.22 In a single-site pre-post pilot study involving paid resident volunteers from 3 specialties, standalone VCA practice without a didactic curriculum was associated with an increase in ratings of residents' error disclosure skills.23 Because evidence of the VCA's effectiveness remains limited, we sought to test the effect of formative feedback delivered by the VCA in a large multisite cohort as part of an error disclosure curriculum. This article describes a randomized clinical trial to test the hypothesis that residents' error disclosure skills, as assessed by laypeople, would improve after reviewing reports with personal performance feedback and recommendations for effective error disclosure.

Error Disclosure Training
Programs assigned all eligible PGY2 residents to attend a 75-minute teaching session at time 1, consisting of 50 minutes of lecture about communication with patients after medical errors, 20 minutes of VCA practice with 2 cases (containing 4 and 3 sequenced vignettes, respectively), and 5 minutes of debrief. At time 2, residents attended a session consisting of 25 minutes of lecture about institutional programs to support clinicians with error disclosure and 20 minutes of VCA practice with 2 additional cases (3 sequenced vignettes each). The recommended duration between time 1 and time 2 was 4 weeks, although the conference schedule at 2 residencies required an interval of 5 to 8 weeks for some residents. The training took place during regularly scheduled conferences for PGY2 residents. The lectures were delivered over video conference by investigators experienced with communication skills training (A.A.W. and T.H.G.). The lecture was adapted from published curricula and modified to highlight site-specific event review policies and clinician support systems.25,26 Residents were encouraged to complete the VCA during the allocated conference time, but could complete it within 5 days if necessary. The study ended when all teaching conferences organized by programs had concluded.

Intervention
Residents who completed the VCA at time 1 were randomized 1:1 to receive feedback either before time 2 (intervention) or after time 2 (control) (Figure 1). Block randomization was performed centrally in variable block sizes, before time 1 responses were scored, by a coinvestigator (A.E.D.) with access to lists of the nonidentifying coded usernames of residents who completed time 1.
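The allocation scheme described above can be illustrated with a minimal sketch of permuted-block randomization with variable block sizes. The trial does not report the block sizes, the software, or the seed that were used, so the function name `block_randomize`, the candidate block sizes (2, 4, 6), and the example usernames are all assumptions made for illustration only.

```python
import random


def block_randomize(usernames, block_sizes=(2, 4, 6), seed=None):
    """Assign coded usernames 1:1 to two arms using permuted blocks.

    block_sizes are hypothetical; each block size is drawn at random so
    that the next assignment cannot be predicted from the block length.
    """
    if any(size < 2 or size % 2 for size in block_sizes):
        raise ValueError("block sizes must be even for 1:1 allocation")
    rng = random.Random(seed)
    assignments = {}
    start = 0
    while start < len(usernames):
        size = rng.choice(block_sizes)
        # Each block holds equal numbers of both arms in shuffled order.
        block = ["intervention", "control"] * (size // 2)
        rng.shuffle(block)
        # zip stops early if the final block is only partly filled.
        for username, arm in zip(usernames[start:start + size], block):
            assignments[username] = arm
        start += size
    return assignments


# Hypothetical coded usernames for residents who completed time 1.
coded_usernames = [f"user{n:03d}" for n in range(1, 13)]
allocation = block_randomize(coded_usernames, seed=2023)
```

Because every complete block is balanced, the two arms can differ in size only by the imbalance within a final, partly filled block.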
Investigators and raters were blinded to assignments. Residents were unblinded after feedback was released. Intervention residents received automated emails when their feedback was available, instructing them to review it in the application (app) before the next teaching session and VCA practice. Feedback was typically provided 2 weeks after VCA use to allow for completion of rating and data quality checks. Reports presented an interactive feedback display within the VCA app for each vignette (Figure 2). We asked residents receiving the intervention not to discuss feedback with colleagues to avoid contamination.
JAMA Network Open | Medical Education

VCA App and Cases
The VCA app used in this study has been described previously.13,18 Users entered the app with a personal login and password to access vignettes or review feedback. This study used 4 cases, including 2 previously described cases (a delayed diagnosis of breast cancer and an anticoagulant overdose).20,21 We created 2 new cases depicting a delayed diagnosis of sepsis and the development of a pressure sore (eTable 1 in Supplement 2). The cases were tested and refined with feedback from 6 faculty members in IM or FM to improve relevance, clarity, and believability. We designed all cases to reflect serious safety events of equivalent preventability and harm severity. Professional actors portrayed each patient or family member.

Audio Collection and Rating
Residents provided audio responses to each vignette through the VCA software. Audio responses were bundled into rating tasks on MTurk for raters who were US residents aged 18 years or older and able to speak and read English. Raters answered demographic questions, read a vignette description in lay language, viewed the patient video, and listened to resident responses. They rated each response on 6 items covering domains related to accountability, honesty, apology, empathy, caring, and overall response, using a previously described instrument.21 Items used a 5-point scale anchored with the labels poor, fair, good, very good, and excellent. After rating a set of responses, the rater responded in free text to the question, "What would you want the provider to say if you were the patient in this situation?"
A power analysis based on previous research,23 assuming a moderate partial η² of 0.09, determined that a sample of 96 PGY2 residents was needed to achieve a power of 0.85 at α = .05 for the analysis of covariance (ANCOVA) to effectively test the study hypotheses.
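Power calculations for F tests are often parameterized by Cohen's f rather than partial η². The following minimal sketch shows the standard conversion, f = sqrt(η² / (1 − η²)), applied to the planning value of 0.09. Whether the investigators used this conversion, and which power software they used, is not stated in the article; the function name `cohens_f` is hypothetical.

```python
import math


def cohens_f(partial_eta_squared):
    """Convert partial eta squared to Cohen's f (standard identity)."""
    if not 0 <= partial_eta_squared < 1:
        raise ValueError("partial eta squared must be in [0, 1)")
    return math.sqrt(partial_eta_squared / (1 - partial_eta_squared))


# Planning effect size reported for the power analysis.
planning_f = cohens_f(0.09)  # roughly 0.31
```

The sketch stops at the effect size conversion; reproducing the required sample size of 96 would additionally need the noncentral F distribution and the design details used by the authors.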
We sought at least 6 raters per response after removing raters with indications of low contributions to reliability.27 To eliminate inattentive raters from quantitative analysis, open-ended responses were analyzed for quality. One analyst reviewed all responses and flagged those that bypassed the question (eg, none, good, or NA [not applicable]), were generic, were repeated across multiple vignettes, or were copied and pasted from the ratings task questions (eg, "the provider understood how I was feeling"). A second analyst reviewed and confirmed all exclusions.
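In the trial, this screening was performed by human analysts. As an illustration only, the flagging criteria above could be approximated by a simple automated prescreen that queues comments for analyst review. The sketch below is an assumption-laden example, not the authors' procedure: the function `flag_comments`, the `BYPASS` word list, and the `min_words` threshold are hypothetical, and the "generic" criterion is left to human judgment because it cannot be captured by simple string rules.

```python
import re
from collections import Counter

# Hypothetical list of answers that bypass the question.
BYPASS = {"none", "good", "na", "n/a", "nothing", "not applicable"}


def normalize(text):
    """Lowercase, drop punctuation (except '/'), and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9/ ]+", "", text.lower())
    return " ".join(cleaned.split())


def flag_comments(comments_by_vignette, task_prompts=(), min_words=3):
    """Return {vignette_id: [reasons]} for one rater's free-text comments.

    Reasons: 'bypassed' (nonanswer or too short), 'repeated' (same text
    used for more than one vignette), 'copied' (matches a task prompt).
    """
    normalized = {v: normalize(c) for v, c in comments_by_vignette.items()}
    counts = Counter(normalized.values())
    prompts = {normalize(p) for p in task_prompts}
    flags = {}
    for vignette, text in normalized.items():
        reasons = []
        if text in BYPASS or len(text.split()) < min_words:
            reasons.append("bypassed")
        if counts[text] > 1:
            reasons.append("repeated")
        if text in prompts:
            reasons.append("copied")
        flags[vignette] = reasons
    return flags
```

A rater whose comments are flagged would still be confirmed or cleared by a reviewer, mirroring the two-analyst check described above.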

Resident Surveys
Residents completed questionnaires in the VCA application before proceeding to cases. The survey at time 1 asked about age, gender, race, the number of times the resident had personally participated in disclosure of a harmful error to a patient or family, and their highest level of involvement during disclosure of a harmful medical error. Data on race were included in this study because this information would be valuable for future analyses to address racial concordance between users and raters. Before time 2, residents who had received feedback were asked, "Approximately how many minutes did you spend reviewing your feedback?" (response options in 5-minute ranges), "How many of your own responses did you replay?", and "How many of the exemplar (highly rated peer) responses did you play?" (response options of 0, 1-2, 3-4, and ≥5). Residents responded to 4 additional items about the usefulness of each feedback component (scores, personal recordings, exemplar recordings, and learning points) using a 5-point scale with labels from not at all to extremely.

Statistical Analysis
Data analysis was performed from July to December 2023. We averaged ratings across items and raters to create an overall rating of each response. We then averaged response ratings across all 7 vignettes at time 1 to create an overall time 1 score, and across all 6 vignettes at time 2 to create a time 2 score. We created a dichotomous disclosure exposure variable by combining disclosure involvement level and the number of times participated in disclosure.
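The two-stage averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a list of per-rater item scores for each vignette), the function names, and the example ratings are hypothetical, and the study's analysis was performed in R rather than Python.

```python
from statistics import mean


def response_rating(ratings):
    """Average all item scores from all raters for one response.

    ratings: list with one entry per rater, each a list of item scores
    on the 5-point scale (6 items per rater in the study instrument).
    """
    return mean(score for rater_items in ratings for score in rater_items)


def overall_score(responses):
    """Average the response ratings across vignettes for one resident.

    responses: dict mapping vignette id to that response's ratings.
    """
    return mean(response_rating(r) for r in responses.values())


# Hypothetical resident with 2 vignettes and 2 raters per response;
# the study used 7 vignettes at time 1, 6 at time 2, and 6 or more raters.
example = {
    "vignette_1": [[3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 4, 4]],
    "vignette_2": [[2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]],
}
time_score = overall_score(example)  # 3.0
```

Averaging within each response first gives every vignette equal weight in the overall score, even when responses attract different numbers of raters.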
To address our primary study question about the effect of the intervention (ie, access to VCA feedback), we conducted a factorial ANCOVA examining the effect of the intervention and prior disclosure exposure on the primary outcome, time 2 scores, while adjusting for time 1 scores. We conducted a modified intention-to-treat analysis, including all residents with both time 1 and time 2 data; those who did not complete time 2 were necessarily excluded from analysis because they did not provide data for the main outcome. Post hoc tests examining the difference between the intervention and control group for each level of prior disclosure exposure were conducted using the Bonferroni correction. We used a Wilcoxon rank sum test to compare overall scores across specialties. We used logistic regression to investigate whether time 1 scores were associated with the likelihood that participants returned for time 2.
Prefer not to say 1 (1.9) 1 (2.
b Other was a choice that users could select if they felt it best described their race.
in Supplement 2). After removing inattentive raters, each response was rated by 6

Communication Rating Outcomes
Figure 3 displays the distribution of crowdsourced ratings by intervention assignment (eTable 3 in Supplement 2 presents time 1 ratings). High performers were rated 2 points higher than low performers on a 5-point scale. The ANCOVA model, which included time 1 scores as a covariate, showed a significant main effect of the intervention; the mean (SD) time 2 overall scores were 3.26 (0.45) for the intervention group and 3.14 (0.39) for the control group (difference, 0.12; 95% CI, 0.08-0.48; partial η² = 0.04; P = .01). We also detected a significant interaction between the intervention (ie, feedback availability) and prior exposure to disclosure conversation (partial η² = 0.05; P = .03) after adjusting for time

Discussion
This multisite, randomized clinical trial found that using VCA to provide crowdsourced feedback to PGY2 IM and FM residents about error disclosure skills was associated with an improvement in these skills. Feedback was most impactful among residents who reported they had not been exposed to error disclosure in clinical care, suggesting this intervention could be particularly beneficial at an earlier phase of training. Our findings highlight the potential for the VCA as a scalable practice tool for training that would be logistically challenging to replicate with standardized patients.
Despite these encouraging findings, surveys revealed that many residents either did not review or spent minimal time reviewing their feedback, which likely blunted the intervention's effect. To optimize the VCA's efficacy, future research should investigate and resolve barriers to residents' use of crowdsourced feedback. Possible barriers in this trial included the delay between practice and feedback, the lack of protected time to review feedback, a need for adjunctive coaching, unidentified shortcomings of the feedback content and presentation, or the need for more practice repetitions.
If confirmed, some of these potential barriers can be addressed with technical or curricular changes, such as providing dedicated time for feedback review or a paired faculty coach. However, using crowdsourcing to incorporate the layperson's voice in statistically reliable feedback currently requires at least 2 to 3 days, making it difficult to provide instantaneous results.

Methods
From July 2022 through May 2023, we conducted a single-blinded, multicenter, randomized clinical trial of the effect of crowdsourced ratings and feedback on postgraduate year 2 (PGY2) internal medicine (IM) and family medicine (FM) resident physicians' medical error communication skills (see the trial protocol in Supplement 1). The University of Washington institutional review board ruled this study exempt from review. Participants were not compensated. No VCA results were shared with residency faculty. Risks and benefits were explained verbally; participation was considered to indicate consent. Residents could participate in the training and opt out of research. This report follows the Consolidated Standards of Reporting Trials (CONSORT) reporting guideline for randomized studies.24

Setting and Participants
Participants attended IM and FM residencies at 7 US academic medical centers: University of Washington, Seattle (IM and FM); University of Washington, Boise (IM); Washington State University, Everett (IM); Beaumont University (IM at Dearborn and Royal Oak, FM at Wayne and Troy); Dartmouth-Hitchcock Medical Center (IM); University of Massachusetts, Worcester (IM); and Washington University, St. Louis (IM). Each residency participated during a 4- to 8-week window chosen by program leaders to optimize PGY2 residents' availability. Before the study, none of the residencies provided programwide required error disclosure training. We chose IM and FM residencies because of their large size and shared familiarity with medical cases involving adults. We enrolled only PGY2 residents to control for years of training and simplify scheduling. Residents were eligible for the study if they were on any clinical or nonclinical rotation that provided protected time to attend the teaching conference chosen by their program for VCA practice. Residents were not eligible if they were on leave at the time of the study.

Figure 2. Video-Based Communication Assessment Feedback Interface for 1 Vignette Within an Error Disclosure Case

a Ten participants who completed time 1 but not time 2 were omitted because of insufficient numbers of ratings.

Figure 3. Violin Plot of Participants' Video-Based Communication Assessment Scores by Study Group

Table 1. Characteristics of Resident Physicians

All statistical analysis was performed in R statistical software version 4.1.2 (R Project for Statistical Computing), with a 2-sided P < .05 considered statistically significant, except with ANCOVA, which is inherently 1-sided.
to 18 laypeople (mean [SD], 9.50 [1.60] individuals). The 53 participants in the intervention group completed surveys about interacting with the VCA feedback available before time 2. Two surveys lacked data because of electronic storage errors. Of the 51 residents with survey data, 28 (54.9%) reported that they had reviewed their feedback before the survey, reporting variable total periods of time in review; 7 (13.7%) spent less than 5 minutes, 12

Table 2. Survey Responses of Resident Physicians Randomized to Receive Crowdsourced Feedback About Error Disclosure Skills
a Responses sum to 51 rather than 53 because a data transfer issue caused data loss from 2 completed surveys. Those who did not report reviewing feedback at the time of the survey could have reviewed it at a later time before time 2.

eTable 2. Demographics of laypeople crowdsourced via Amazon Mechanical Turk (MTurk) who provided attentive ratings of resident adverse event communication skills
eTable 3. VCA ratings at Time 1 (baseline)
eTable 4. ANCOVA table for the impact of feedback and disclosure exposure on time 2 scores